A Data Cleaning Model for Electric Power Big Data Based on Spark Framework

Authors

  • Zhao-Yang Qu
  • Yong-Wen Wang
  • Chong Wang
  • Nan Qu
  • Jia Yan
Abstract

Cleaning electric power big data improves its correctness, completeness, consistency, and reliability. Two difficulties arise in the process: a unified anomaly detection pattern is hard to extract, and corrections of anomalous data suffer from low accuracy and poor continuity. To address them, a data cleaning model for electric power big data based on Spark is proposed. First, the normal clusters and their corresponding boundary samples are obtained with an improved CURE clustering algorithm. Then, an anomaly identification algorithm based on these boundary samples is designed. Finally, anomalous data are corrected using an exponentially weighted moving average. Experiments on cleaning wind power generation monitoring data from a wind power station demonstrate the model's efficiency and accuracy.
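The final correction step can be illustrated with a minimal sketch. The abstract does not give the paper's exact weighting scheme, so the version below is an assumption: a standard exponentially weighted moving average with a hypothetical smoothing factor `alpha`, written in plain Python rather than on Spark. The anomaly flags are taken as input; the clustering and boundary-sample identification steps are not reproduced here, and the function name `ewma_correct` is illustrative only.

```python
from typing import List, Optional, Sequence


def ewma_correct(
    values: Sequence[float],
    is_anomaly: Sequence[bool],
    alpha: float = 0.5,  # hypothetical smoothing factor, not taken from the paper
) -> List[float]:
    """Replace flagged points with the running EWMA of the preceding cleaned points."""
    if len(values) != len(is_anomaly):
        raise ValueError("values and is_anomaly must have the same length")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    cleaned: List[float] = []
    ewma: Optional[float] = None  # running average over cleaned values only

    for x, bad in zip(values, is_anomaly):
        if bad:
            if ewma is None:
                # No clean history yet: keep the raw value, but do not let it
                # seed the running average.
                cleaned.append(x)
                continue
            x = ewma  # substitute the current smoothed estimate
        cleaned.append(x)
        # Standard EWMA update: recent values weigh more than older ones.
        ewma = x if ewma is None else alpha * x + (1.0 - alpha) * ewma

    return cleaned


if __name__ == "__main__":
    readings = [10.0, 12.0, 100.0, 11.0]  # made-up readings for illustration
    flags = [False, False, True, False]   # third reading flagged as anomalous
    print(ewma_correct(readings, flags))
```

Because each replacement is derived from the cleaned values that precede it, the corrected series stays continuous with its recent history, which is the property the abstract emphasizes.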

Similar resources

InferSpark: Statistical Inference at Scale

The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the...

Two-stage Stochastic Programming Based on the Accelerated Benders Decomposition for Designing Power Network Design under Uncertainty

In this paper, a comprehensive mathematical model is proposed for designing an electric power supply chain network that considers preventive maintenance under the risk of network failures. The risk of capacity disruption in the distribution network is handled by using two-stage stochastic programming as the framework for modeling the optimization problem. An applied method of planning for the net...

Renewable Energy Integration in Distribution System -- Synchrophasor Sensor based Big Data Analysis, Visualization, and System Operation

With increasing attention to renewable energy in distributed power systems, hybrid smart grid (SG) operation is mainly characterized by distributed renewable generation, data visualization, high-accuracy data prediction, and operation cost minimization. Due to the large volume of heterogeneous data provided by both the customer and the grid side, a big data visualization ...

A comparison on scalability for batch big data processing on Apache Spark and Apache Flink

The large amounts of data have created a need for new fram...

Task-based programming in COMPSs to converge from HPC to big data

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have been good demonstrators of this fact, and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for ad...

Publication date: 2016